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Abstract — Total-variation (TV)-based CT image reconstruction 
has shown experimentally to be capable of producing accurate 
reconstructions from sparse-view data. In particular TV-based 
reconstruction is very well suited for images with piecewise 
nearly constant regions. Computationally, however, TV-based re- 
construction is much more demanding, especially for 3D imaging, 
and the reconstruction from clinical data sets is far from being 
close to real-time. This is undesirable from a clinical perspective, 
and thus there is an incentive to accelerate the solution of the 
underlying optimization problem. 

The TV reconstruction can in principle be found by any 
optimization method, but in practice the large scale of the 
systems arising in CT image reconstruction preclude the use 
of memory-demanding methods such as Newton's method. The 
simple gradient method has much lower memory requirements, 
but exhibits slow convergence. 

In the present work we address the question of how to reduce 
the number of gradient method iterations needed to achieve a 
high-accuracy TV reconstruction. We consider the use of two 
accelerated gradient-based methods, GPBB and UPN, to solve 
the 3D-TV minimization problem in CT image reconstruction. 
The former incorporates several heuristics from the optimization 
literature such as Barzilai-Borwein (BB) step size selection and 
nonmonotone line search. The latter uses a cleverly chosen 
sequence of auxiliary points to achieve a better convergence rate. 

The methods are memory efficient and equipped with a 
stopping criterion to ensure that the TV reconstruction has 
indeed been found. An implementation of the methods (in 
C with interface to Matlab) is available for download from 
http://www2.imm.dtu.dk/~pch/TVReg/ 

We compare the proposed methods with the standard gradient 
method, applied to a 3D test problem with synthetic few-view 
data. We find experimentally that for realistic parameters the 
proposed methods significantly outperform the gradient method. 



Index Terms — Total-variation, Gradient-based optimization. 
Strong convexity. Algorithms 



X 



I. Introduction 

Algorithm development for image reconstruction from in- 
complete data has experienced renewed interest in the past 
years. Incomplete data arises for instance in case of a small 
number of views, and the development of algorithms for 
incomplete data thus has the potential for a reduction in 
imaging time and the delivered dosage. 
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Total-varation (TV)-based image reconstruction is a promis- 
ing direction, as experiments have documented the potential 
for accurate image reconstruction under conditions such as 
few-view and limited-angle data, see e.g. (T). 

However, it also known that it is difficult to design fast 
algorithms for obtaining exact TV reconstructions due to non- 
linearity and nonsmoothness of the underlying optimization 
problem. Many different approaches have been developed, 
such as time marching |j2l, fixed-point iteration 13], and various 
minimization-based methods such as sub-gradient methods 
[4J, second-order cone programming (SOCP) [5 |, and duality- 
based methods ll6l, Q - but for large-scale applications 
such as CT image reconstruction the computational burden 
is still unacceptable. As a consequence heuristic and much 
faster techniques such as the one in 1 1 1 for approximating 
the TV solution have been developed. In such inaccurate, 
but efficient, TV-minimization solvers the resulting image 
depends on several algorithm parameters, which introduces an 
unavoidable variability. In contrast, for the accurate TV algo- 
rithms considered here, the resulting image can be considered 
dependent only on the parameters of the optimization problem. 

In this work we present two accelerated gradient-based 
optimization methods that are capable of computing the TV 
reconstruction of 3D volumes to within an accuracy specified 
by the user 

II. Theory 
A. Total-variation-based image reconstruction 

In this work we consider total-variation (TV)-based image 
reconstruction for computed tomography. The 3D reconstruc- 
tion is represented by the vector x* which is the solution to 
the minimization problem 

x* = argmin (t){x), 4>{x) = -\\A x — h\\\ + a\\x\\T^Y . (1) 

Here, x is the unknown image, Q is the set of feasible x, 
A is the system matrix, h is the projection data stacked into 
a column vector, and a is the TV regularization parameter 
specifying the relative weighting between the fidelity term and 
the TV term. ||x||tv is the discrete total-variation of x. 
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where N is the number of voxels and Dj is the forward 
difference approximation to the gradient at voxel j. 



B. Smooth and strongly convex functions 

We recall that a continuously differentiable function / is 
convex if 



Algorithm 1: GPBB 



.f{x)>f{v)+Vf{yf{x-y) 



input : x^°\ K 
output: a^C'+i) 

1 00 = 1 ; 

(3) 2 for /c = 0, 1, 2, . . . do 



for all x, y e Q. A stronger notion of convexity is strong 4 
convexity: f is said to be strongly convex with strong convexity 5 
parameter ji if there exists a > such that 



f{x) > fiy) + Vfiyfix -y) + - yg (4) 

for all X, y ^ Q. Furthermore, / has Lipschitz continuous 
gradient with Lipschitz constant L, if 



fix) < f{y) + Wfiyfix -y) + ^L\\x - y\\l 



(5) 
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// BB strategy 
if fc > then 

_ (a;('=)-a:('=-i),V/(x('=))-V/(a:('=-i))) ' 

/3 ^ 0.95 ; 

x4-Pq(x('=) -/36ifcV/(x('=))) ; 

/ ^ max{/(x('=)), /(xC^-D), . . . , /(a;^^-^))} 

while /(x) > /-crV/(x('=))'^(a;('^) -x) do 

; 

_ 5 ^ Pq(x('=) - /36lfeV/(xW)) ; 
(fc+i) ^ ^ . 



for all X, y E Q. The ratio /i/L is important for the 
convergence rate of gradient methods we will consider. 

The problem ([TJ can be shown |8| to be strongly convex 
and have Lipschitz continuous gradient in the case where A 
specifies a full-rank overdetermined linear system. In the rank 
deficient or underdetermined case, which occurs for instance 
for few-view data, the strong convexity assumption is violated. 
However, as we shall see, this turns out not to pose a problem 
for the gradient methods we consider. 

III. Algorithms 
A. Gradient projection methods 

The optimization problem ([TJ can, in principle, be solved 
by use of a simple gradient projection (GP) method 



(6) 



where Pq denotes projection onto the set Q of feasible x, and 
Ok is the step size at the fcth step. The worst-case convergence 
rate of GP with > and constant step size is 



(7) 



where Cqp is a constant ||9] §7.1.4]. 

For large-scale imaging modalities, such as CT, this slow 
convergence renders the simple gradient method impractical. 
On the other hand the simplicity and the low memory re- 
quirements of the gradient method remain attractive. Various 
modifications have been suggested in the optimization litera- 
ture. For instance, a significant acceleration is often observed 
empirically if the gradient method is equipped with a Barzilai- 
Borwein (BB) step size strategy and a nonmonotone line 
search lHU], IH, Oil, Ql, see Algorithm [T] GPBB 

for a pseudo-code. Empirically we have found K ^ 2 and 
a = 0.1 to be satisfactory parameter choices. However, it 
remains unproven that GPBB achieves a better worst-case 
convergence rate than (|7]i. 



B. Nesterov's optimal method 

Nesterov |fT5l proposed a gradient-based method that for 
given ij, > achieves the convergence rate 
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where Cn is a constant, and he proved the method to be 
optimal, i.e., that no gradient-based method can achieve better 
worst-case convergence rate on the class of strongly convex 
problems. 

Comparing (|7]i and (|8]l, we see how the ratio fi/L affects the 
predicted worst-case convergence rates: When /i/L decreases, 
both convergence rates become slower, but less in ([HJ due 
to the square root. We therefore expect Nesterov's method to 
show better convergence for smaller 1.1/ L. Small /i/L arise for 
instance when the number of views is small, see |8|. 

Nesterov's method requires that both /i and L are given 
by the user, and in order for the method to be convergent 
fi must be chosen sufficiently small and L sufficiently large. 
For real world applications such as CT, /i and L are seldom 
known, which makes the method impractical. Taking overly 
conservative estimates can depreciate the better convergence 
rate (|8]l; hence, accurate estimates of /i and L are important. 

C. Estimating fj, and L 

A sufficiently large L can be chosen using back-tracking 
line search lfT6l . ifTTl . see Algorithm [2] BT for pseudo-code. 
Essentially, an estimate Z of i is increased by multiplication 
with a constant pj^ > 1 until (|5]l is satisfied. 

Accurately estimating /i, such that (|4]) is satisfied globally, is 
more difficult. Here, we propose a simple and computationally 
inexpensive heuristic: In the fcth iteration choose an estimate 
fik as the largest value of that satisfies Q between x^*'-' and 
y'^'^K and make the /i^-sequence non-increasing: 

f{x)~fiy)-Vf{yr{x^y) 



fj,k = mm < fik-i 



Ux- 



(9) 



Algorithm 2: BT 

input : y,L 
output: X, L 

1 L^L ; 

3 While f{x) > f{y) + Vfiyfix - y) + '^L\\x - y\\l do 

4 L ^ plL ; 

5 l x^PQ{y-L-^Wf{y)) ; 



Algorithm 3: UPN 

input : x^°\fL, L 

output: a^C'+i) 
1 [x^^\La] ^ BT(a;(o),L) ; 
2fio = P, y^-^^ ^ x^^\ 6*1 ^ y/VA) ; 

3 for fc = 1, 2, . . . do 

4 [x('=+i),L,] ^BT(yW,Lfc_i) ; 

5 ^fc ^ min{/iA;_i, A/(a;('=),y(''^))} ; 

6 4- positive root of O"^ — (I — 0)9\ + {^ik/Lk) ; 



We call the Nesterov method equipped with estimation of /i 
and L Unknown Parameter Nesterov (UPN) and pseudo-code 
is given in Algorithm |3] UPN. 

Unfortunately, convergence of UPN is not guaranteed, since 
the estimate (|9| can be too large. However, we have found 
empirically that an estimate sufficient for convergence is 
typically effectively determined by (|9]). 

It is possible to ensure convergence by introducing a restart- 
ing procedure 1 8 1 at the price of lowering the convergence rate 
bound and thereby losing optimality of the method. However, 
we have found empirically that the restarting procedure is sel- 
dom needed, and for realistic parameters the simple heuristic 
(|9]l is sufficient. 

D. Stopping criterion 

For an unconstrained convex optimization problem such as 
([TJ the norm of the gradient is a measure of closeness to the 
minimizer through the first-order optimality conditions fT8|. 
For a constrained convex optimization problem it is possible 
to express a similar optimality condition, namely in terms of 
the gradient map defined by 

G,{x) = u{x~PQ{x^u-'Vfix))), (10) 

where > is a scalar. A point x* is optimal if and only if 
G,^{x*) — for any ly > ifTTl . We can use this to design 
a stopping criterion: Stop the algorithm after iteration k if 
||G,.(xW)||2/A^ < e, where e is a user-specified tolerance. 

For an under-determined problem, e.g. in the few-view case, 
the objective function in ([T]) is nearly flat at the minimizer, 
which makes it difficult to determine when a sufficiently 
accurate reconstruction has been found. Here, the gradient map 
provides a simple, yet sensitive, stopping criterion. 



IV. Simulation results and discussion 

A. Simulation setup 

At this point we emphasize that our objective is to obtain 
an accurate TV reconstruction while reducing the required 
number of gradient method iterations. We include two recon- 
structions merely to demonstrate that the methods indeed are 
successful in solving ([T]i, thereby reconstructing the desired 
image. In [8| dependence of the convergence with respect to 
parameter variation is explored. 

To demonstrate and compare the convergence of GP, GPBB 
and UPN we set up a simple test problem. As test image 
Xtruo we use the threedimensional FORBILD head phantom 
discretized into 64'^ voxels. We simulate a parallel beam 
geometry with view directions evenly distributed over the 
unit sphere. Projections are computed as the forward mapping 
of the discretized image subject to additive Gaussian white 
noise e of relative magnitude ||e||2/|| Aa;truc||2 = 0.01, i.e., 

b = Axtruc + e- (11) 

We enforce nonnegativity by taking Q — M.^ . We consider 
two reconstructions: A "many-view" using 55 views and a 
"few-view" using only 19 views of size 91^ pixels. In the 
latter case A has less rows than columns, which can be shown 
m to lead to violation of the assumption on strong convexity. 
The iterations are continued to the tolerance e = 10~® is met. 

B. Simulation results 

Fig. [T] shows the middle (33rd) axial voxel slice through the 
original 3D volume together with many-view and few-view 
UPN reconstructions using a = 0.01. Both reconstructions 
reproduce the orignal features accurately, except for two small 
features are missing in the few-view reconstruction. Fig. |2] 
shows the convergence of the three methods in terms of 
objective value ^^^^^ relative to the true minimal objective value 
(j>* as function of the iterations k. As (p* is unknown, we have 
approximated it by computing the UPN solution for an e two 
orders of magnitude lower than the value used in the iterations. 

In both cases we see that UPN converges to a satisfactory 
accuracy within 2000 iterations, whereas GP does not, and 
GPBB only does in the former case. In the many-view case 
UPN and GPBB both produce a significant (and comparable) 
acceleration over GP. In the few-view case, we also observe 
acceleration for both, but UPN stands out with much faster 
convergence. This is in accordance with the expectation stated 
in Section HITB] 

The adequacy of the stopping criterion is evaluated by a 
simple visual comparison of the few-view simulation gradient 
map norm decay (Fig. |2] right) and the objective decay 
(Fig. [2] center). Apart from the erratic decay for GPBB (which 
is caused by highly irregular step length selection) there is 
a pronounced correspondence, and we therefore consider the 
stopping criterion effective. 

Although UPN was designed for strongly convex problems, 
we conclude that the method also works in the non-strongly 
convex case of having few-view data - in fact, from the 
preliminary results the non-strongly convex case is where 





Fig. 2. Convergence histories. Left: Many-view simulation. Center: Few-view simulation. Right: Gradient map norm histories for few-view simulation. 



UPN shows its biggest potential by exhibiting a much faster 
convergence than GP and GPBB. 

V. Conclusion 

We have described the gradient-based optimization methods 
GPBB and UPN and their worst-case convergence rates. Our 
simulations show that both algorithms are able to significantly 
accelerate high-accuracy TV-based CT image reconstruction 
compared to a simple gradient method. In particular UPN 
shows much faster convergence when applied to few-view 
data. The software (implementation in C with an interface 
to Matlab) is available from http://www2.imm.dtu.dk/~pch/ 
|TVReg/| 
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